A matter of degree
Statistical evidence for causality combines observed data with a mathematical model of the world
Causal evidence varies in terms of complexity of math/assumptions: a matter of degree
Model-based inferences about causality depend on complex statistical models with many assumptions
Design-based inferences about causality use carefully controlled comparisons with simple, transparent models and assumptions
Whatever our approach…
do the assumptions needed to use this mathematical tool reasonably fit reality?
When we condition, we block specific backdoor paths that generate confounding by comparing cases that are similar on observable traits. We assume:
Conditioning and model dependence:
These models bring additional modelling assumptions:
On top of which: we always need to argue that we have blocked all backdoor paths
Instead of relying on conditioning, we might choose more careful research designs:
Here, the structure of the comparison motivates an argument for independence of cause and potential outcomes. Rather than blocking specific confounding variables one by one, we eliminate confounding from an entire class of confounders.
Social and political theorists have frequently argued that media—by shaping perceptions of events in the world and exposing people to narrative frames—affects beliefs and behaviors.
[board]
Foos and Bischoff (2022) examine the effect of changing exposure to The Sun on anti-EU attitudes and voting in the UK.
We could simply compare attitudes about the EU in areas with higher and lower readership of The Sun (or among people who read The Sun versus those who do not):
In 1989, nearly 100 Liverpool FC fans died in a crush at a match (the Hillsborough disaster):
Is there any way to make use of this event to learn about the effect of reading The Sun?
We could compare attitudes toward the EU in Liverpool before and after the boycott: this is sometimes called an interrupted time series
Plug in the observed outcome before treatment for the counterfactual outcome after treatment: \(t=1\) is post-treatment, \(t=0\) is pre-treatment.
\[\tau_i = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]
Plugging in:
\[\widehat{\tau_i} = \underbrace{[Y_{i,t=1}(1) | D_i = 1]}_{\text{Liverpool post-1989, boycott}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\]
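As a quick numeric sketch of the plug-in estimator above (the EU-skepticism shares here are made up purely for illustration):

```python
# Interrupted time series: plug in the observed pre-treatment outcome
# for the unobserved post-treatment counterfactual.
# Numbers are hypothetical, not from Foos and Bischoff (2022).
y_post_treated = 0.30   # Liverpool post-1989, boycott:    Y(1), t = 1
y_pre_treated  = 0.42   # Liverpool pre-1989,  no boycott: Y(0), t = 0

# tau_hat = observed post-treatment outcome - observed pre-treatment outcome
tau_hat = y_post_treated - y_pre_treated
print(tau_hat)  # roughly -0.12, valid only if Y(0) would not have changed over time
```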
In order for interrupted time series to work, we must assume:
\[\overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}} = \color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}}\]
That in the absence of the treatment, outcomes of \(Y\) would not have changed from before to after treatment.
These assumptions imply that none of the following occurred:
SUPER IMPORTANT: If there is some other factor that changes over time and affects \(Y\), it can induce bias …
…EVEN IF IT DOES NOT CAUSE \(D\).
The comparison holds the unit constant before and after the event, but treatment changes exactly with time, generating dependencies between treatment and any variables that move together over time.
A good example of how to do this persuasively:
What kind of causal effect are we estimating when we do before and after comparisons?
\[\begin{split}E[\tau_i | D_i = 1] = {} \frac{1}{n}\sum\limits^{n}_{i=1} & [Y_{i,t=1}(1) | D_i = 1] - \\ & \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\end{split}\]
Is this the average causal effect?
Before-after comparisons assume no other changes in outcomes over time, but it is almost always true that
\(\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1] \neq 0\)
i.e., counterfactually, in the absence of treatment \(D\), potential outcomes \(Y_i(0)\) are changing over time.
Observed pre-treatment outcomes are not a good substitute for post-treatment counterfactual outcomes.
In our example: we don’t know how EU skepticism might have trended in Liverpool absent the boycott. We do know how EU skepticism in the rest of the UK trended absent the boycott.
We don’t know: \(\color{red}{\overbrace{[Y_{i,t=1}(0)|D_i = 1]}^{\text{Liverpool post-1989, no boycott}}} - \overbrace{[Y_{i,t=0}(0)|D_i = 1]}^{\text{Liverpool pre-1989, no boycott}}\)
We do know: \(\underbrace{[Y_{i,t=1}(0) | D_i = 0]}_{\text{UK post-1989, no boycott}} - \underbrace{[Y_{i,t=0}(0)|D_i = 0]}_{\text{UK pre-1989, no boycott}}\)
Difference-in-differences compares changes in the treated cases against changes in untreated cases.
We use the trends in the untreated cases to plug in for the \(\color{red}{counterfactual}\) trends (absent treatment) in the treated cases
If we assume:
\[\{\overbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}^{\text{Treated counterfactual trend}}\} = \\ \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\]
Then we can plug in the \(observed\) untreated group trend for the \(\color{red}{counterfactual}\) treated group trend.
This is the parallel trends assumption. It is equivalent to saying there are no time-varying confounding variables that differ between treated and untreated.
If it is true, we can do some simple algebra and find that
\([\tau_i | D_i = 1] = [Y_{i,t=1}(1) | D_i = 1] - \color{red}{[Y_{i,t=1}(0)|D_i = 1]}\)
\(\begin{equation}\begin{split}[\tau_i | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{\color{red}{[Y_{i,t=1}(0) | D_i = 1]} - [Y_{i,t=0}(0)|D_i = 1]}_{\text{Treated counterfactual trend}}\}\end{split}\end{equation}\)
\(\begin{equation}\begin{split}[\widehat{\tau_i} | D_i = 1] = {} & \{\overbrace{[Y_{i,t=1}(1) | D_i = 1] - [Y_{i,t=0}(0) | D_i = 1]}^{\text{Treated observed trend}}\} - \\ & \{\underbrace{[Y_{i,t=1}(0) | D_i = 0] - [Y_{i,t=0}(0)|D_i = 0]}_{\text{Untreated observed trend}}\}\end{split}\end{equation}\)
And this gives us the name:
This shows that the boycott of The Sun reduced Euroscepticism in Liverpool
Under the parallel trends assumption (untreated cases have the same trends as treated cases in the absence of treatment)
If parallel trends assumption holds, what kinds of confounding does this design eliminate?
What are examples of confounders held constant in the Sun boycott difference-in-differences?
In the newspaper example: what would be an example in which some other factor generates bias that is not solved by the difference-in-differences estimate?
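Before turning to regression, the difference-in-differences estimator above can be computed directly from the four group-by-period means. A minimal sketch (all numbers are made up for illustration, not taken from the study):

```python
# Difference-in-differences from the four group-by-period means.
# Hypothetical EU-skepticism shares, invented for illustration.
means = {
    ("liverpool", "pre"):  0.42,  # treated,   t = 0
    ("liverpool", "post"): 0.30,  # treated,   t = 1 (boycott in effect)
    ("rest_uk", "pre"):    0.40,  # untreated, t = 0
    ("rest_uk", "post"):   0.44,  # untreated, t = 1
}

treated_trend = means[("liverpool", "post")] - means[("liverpool", "pre")]
untreated_trend = means[("rest_uk", "post")] - means[("rest_uk", "pre")]

# Difference of the two differences: treated change minus untreated change
tau_hat = treated_trend - untreated_trend
print(tau_hat)  # roughly -0.16 with these made-up means
```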
Estimation:
\(Y_{it} = \beta_0 + \beta_1 \text{Treated}_i + \beta_2 \text{Post}_t + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)
\(Y_{it} = \overbrace{\alpha_i}^{\text{dummies for each } i} + \underbrace{\alpha_t}_{\text{dummies for each } t} + \beta_3 \text{Treated}_i \times \text{Post}_t + \epsilon_{it}\)
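In the 2×2 case, the interaction coefficient \(\beta_3\) from the regression is numerically identical to the difference-in-differences of cell means. A simulated sketch (the data-generating parameters and noise level are invented for illustration):

```python
# In a saturated 2x2 design, OLS beta_3 equals the DiD of group means.
# Simulated data; coefficients chosen arbitrarily for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 4000
treated = rng.integers(0, 2, n)   # Treated_i (0/1)
post = rng.integers(0, 2, n)      # Post_t (0/1)
# DGP: group gap 0.02, common time trend 0.04, treatment effect -0.16
y = (0.40 + 0.02 * treated + 0.04 * post
     - 0.16 * treated * post + rng.normal(0, 0.05, n))

# OLS with intercept, Treated, Post, and interaction
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Difference-in-differences of the four cell means
did_means = ((y[(treated == 1) & (post == 1)].mean()
              - y[(treated == 1) & (post == 0)].mean())
             - (y[(treated == 0) & (post == 1)].mean()
                - y[(treated == 0) & (post == 0)].mean()))

assert abs(beta[3] - did_means) < 1e-8  # identical in the saturated model
```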
How do we validate the parallel trends assumption?
Placebo Tests:
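One common placebo test: re-run the DiD within the pre-treatment periods, pretending treatment began at an earlier date, where the true effect must be zero. A sketch with made-up pre-period means:

```python
# Placebo DiD using only pre-treatment periods (t = -1 and t = 0,
# both before the 1989 boycott). Means are hypothetical.
liverpool = {-1: 0.41, 0: 0.42}   # treated group
rest_uk = {-1: 0.39, 0: 0.40}     # untreated group

placebo_did = ((liverpool[0] - liverpool[-1])
               - (rest_uk[0] - rest_uk[-1]))
# An estimate near zero is consistent with, but does not prove,
# the parallel trends assumption.
print(placebo_did)
```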
Staggered Treatment: Goodman-Bacon (2021)
Multiple Treatments: https://arxiv.org/pdf/1803.08807.pdf
Continuous Treatment: https://psantanna.com/files/Callaway_Goodman-Bacon_SantAnna_2021.pdf
Covariates: even conditioning on time-varying confounders has risks
As we get away from simple DiD, assumptions and potential problems multiply, solutions get more complicated…
What was the effect of enlistment in the US Civil War on voting for the Republican party?
\(GOP_{ie} = \alpha_i + \alpha_e + \beta \, Enlist_i \times PostWar_e + \epsilon_{ie}\)
Assumptions:
Caveats:
Address confounding in a different way:
Distinguishing “natural experiment” from experiments:
An observational study in which causal inference comes from a design that exploits naturally occurring (as-if) randomization.
Two approaches:
Decisions:
Assumptions:
Follows from Wald estimator for non-compliance:
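The Wald estimator scales the intent-to-treat effect of the instrument on the outcome by its effect on take-up. A numeric sketch (all quantities are hypothetical):

```python
# Wald / IV estimator under non-compliance:
# ratio of the instrument's effect on Y to its effect on take-up D.
# Numbers are made up for illustration.
mean_y_z1, mean_y_z0 = 0.55, 0.50   # outcome means by instrument value Z
mean_d_z1, mean_d_z0 = 0.80, 0.30   # treatment take-up by instrument value Z

itt = mean_y_z1 - mean_y_z0           # intent-to-treat effect on Y
first_stage = mean_d_z1 - mean_d_z0   # effect of Z on take-up of D
late = itt / first_stage              # local average treatment effect
print(late)  # roughly 0.05 / 0.5 = 0.1
```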
For more information on using them in practice: Lal, Lockhart, Xu, and Zu 2023
Problems
Assumptions: